Abstract

Recent work in Causal Reasoning has provided practical techniques for multiple 
fault diagnosis. These techniques provide for a hypothesis/measurement diagnosis 
cycle. Using probabilistic methods, they choose the best measurements to make, 
then update fault hypotheses in response. 

For many applications such as computers and spacecraft, few measurement points 
may be accessible, or values may change quickly as the system under diagnosis 
operates. In these cases, a hypothesis/measurement cycle is insufficient. This paper 
presents a technique for a hypothesis/test-input/measurement diagnosis cycle. In 
contrast to generating tests a priori for determining device functionality, it dynam- 
ically generates tests in response to current knowledge about fault probabilities. 
The paper shows how the mathematics previously used for measurement specifica- 
tion can be applied to the test input generation process. An example from an effi- 
cient implementation called MPC is presented. 

I. Introduction 

In recent years, AI techniques have proven useful for constructing fault diagnosis 
tools. A particularly interesting subset of these techniques is based on Causal Rea- 
soning. Causal reasoning tools use a model of how the unit should behave, assum- 
ing no faults. This model can then be used to infer possible faults by comparing 
the observed behavior to the behavior predicted by the model. 

Recent work has yielded methods for multiple fault diagnosis using such techniques.
The approach of (de Kleer and Williams), for example, provides an efficient
framework for hypothesizing faults, given a set of measurements. At each step in 
diagnosis, it determines the most helpful measurement to make next. This provides 
a hypothesis/measurement diagnosis cycle. 

A fault-isolation procedure based solely on a hypothesis/measurement cycle is often 
insufficient. Many complex systems such as computers and spacecraft are packaged 
in a way that makes measurement of most internal points time-consuming and ex- 
pensive. Additionally, values within the system may not be static. For example, in 
a microprocessor-based system, data is sent over the databus to peripheral devices. 
Without sophisticated test equipment, the actual databus value cannot be measured 
directly, because it is present for only a fraction of a microsecond. For these rea- 
sons, it is often preferable to restrict measurement, when possible, to a few easily- 
accessible points. In this style of diagnosis, multiple test inputs are generated to 
observe the system in multiple states, rather than multiple measurements taken 
with the system in a single state. Recently, researchers have introduced off-line 
techniques for purposes such as post-assembly checkout (Shirley). However, little
has been presented about dynamically generating tests for isolating multiple faults. 

The following sections present a technique for adapting the mathematics of 
hypothesis/measurement techniques for performing dynamic test input generation 
for multiple-fault isolation. Section II describes the causal models for which this 
technique applies. Section III presents the approach to the test generation and se- 
lection processes, and the techniques adopted for deducing the most probable 
faults. Finally, Section IV presents an example diagnosis using an implemented 
test-generation/fault-isolation system. 

II. Causal Models for Multiple-Fault Diagnosis 

Central to the causal-reasoning scheme is a causal model describing how the system
under diagnosis functions when working properly. The causal model contains a
description of the important points within the system, referred to as elements, and
how the values of these elements cause and affect each other. Consider the system
shown in Figure 1. It is a modified portion of an experiment control electronics
subsystem concept being developed by Martin Marietta. This subsystem will be used
as an example in the following sections.

Figure 1. Example system for diagnosis. Part of experiment
control electronics subsystem. Commands sent and data gathered
via the TDRSS relay satellite.

In the model, elements correspond to inputs and outputs of modules, as well as to a
few module-internal points. Such a system can be modeled with our DEFCAUSAL syntax.
For example, one operation of the Remote Interface Unit (RIU) is to write data to
port EXP-CMD-3 when a :write-3 command is received on BCU-CMD. This relation is
shown below (values starting with a "?" are variables).

(defcausal RIU
  (BCU-CMD (:write-3 ?data) :momentary) causes
  (EXP-CMD-3 ?data :continuing))

Another causal relation describes how the On-Board Computer (OBC), when in the 
:experiment mode, sends commands on the Command Unit (CU) line. The com- 
mands are sent from the OBC command-sequence memory: 


(defcausal OBC
  (OBC-MEM ?memory)
  (MODE :experiment)
  (ELAPSED-TIME ?t) causes
  (CU-CMD (address ?t ?memory) :momentary))


The :momentary and :continuing flags indicate the temporal relations of the causes
and effects.
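
As an illustration only, the RIU relation above could be held in a small data structure such as the following Python sketch. The class names `Condition` and `CausalRelation`, their fields, and the `variables` helper are assumptions made for this example; they are not part of the DEFCAUSAL syntax or of MPC.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Condition:
    """One (element value timing) triple from a causal relation (hypothetical)."""
    element: str       # e.g. "BCU-CMD"
    value: str         # value pattern; tokens starting with "?" are variables
    timing: str = ""   # ":momentary", ":continuing", or "" if unspecified


@dataclass(frozen=True)
class CausalRelation:
    """A relation: when all causes hold, the effects are expected (hypothetical)."""
    module: str
    causes: Tuple[Condition, ...]
    effects: Tuple[Condition, ...]

    def variables(self):
        """Return the set of ?variables mentioned in any cause or effect."""
        found = set()
        for cond in self.causes + self.effects:
            tokens = cond.value.replace("(", " ").replace(")", " ").split()
            found.update(t for t in tokens if t.startswith("?"))
        return found


# The RIU relation from the text, transcribed into the sketch above.
riu_write_3 = CausalRelation(
    module="RIU",
    causes=(Condition("BCU-CMD", "(:write-3 ?data)", ":momentary"),),
    effects=(Condition("EXP-CMD-3", "?data", ":continuing"),),
)
```

The shared variable `?data` is what ties the commanded value on BCU-CMD to the value expected on EXP-CMD-3.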


III. Dynamic Test Input Generation 

An important part of diagnosis is to gain knowledge about the internal state of the 
malfunctioning system. This knowledge helps to decide which among several fault 
hypotheses is actually correct. In systems with many measurable internal points, 
this knowledge is gained by direct measurement, e.g., with voltmeters, logic 
probes, etc. When internal points are inaccessible, we must adopt a different 
means of obtaining this information. Our approach is based on a concept of path 
generation. In this approach, paths through the system are generated, which, if 
tested, will yield information about the internal system state. Our approach incor- 
porates this idea with the following processes: (1) A Candidate Generator, which 
uses the measurements resulting from tests to produce fault hypotheses and their 
associated probabilities; (2) a Path Constructor, which suggests test paths through 
the model to gain information about the hypothesized faults; (3) a Path Selector, 
which chooses the path most helpful in discriminating between fault hypotheses; 
and (4) a Causal Planner, which produces a test input sequence to activate the
selected path. Figure 2 presents a diagram of how these processes interact in a
test-generation/fault-isolation system.

Candidate Generation 

The Candidate Generator derives the fault hypotheses (candidates) implied by ob- 
served symptoms. It assigns probability estimates to each candidate. 

Figure 2. Test generation/diagnosis architecture.

Upon input of a symptom (an unexpected value at an observed element), the
Causal Planner determines which causal relations imply that the correct value,
rather than the symptom value, was to be expected. This set is known as the active
relations set. The Causal Planner produces this set as follows: given the correct
outputs as goals, and the inputs which were present when the symptom was observed
as constraints on the solution, it generates a plan indicating the required inputs.
During planning, the planner selects a causal relation to achieve each subgoal.
When a solution is found, the set of relations selected in planning comprises the
active relations set. This set can be described as a directed graph, relating input
values, through intermediate values, to the desired output value. Assume, for
example, that a value of 35 is desired on the TDRSS-IN line of the system of Figure
1. The resulting active relations set is shown in Figure 3. The active relations set
describes the mechanisms which, if functioning properly, would provide the correct
behavior. When a failure symptom is observed, therefore, at least one of the active
relations must not be functioning as specified. Conversely, if the correct output
results, there is evidence that the active relations set is without fault.
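
The idea of collecting an active relations set can be sketched with a simple backward walk over a directed graph. This is a simplification, not the Causal Planner: it assumes each element is produced by exactly one relation, so no choice among relations and no constraint handling is needed. The function name, the `producers` mapping, and the toy element names are all hypothetical.

```python
def active_relations(goal, producers, primary_inputs):
    """Collect the relations needed to produce `goal`, walking causes backwards.

    producers maps an element name to (relation_name, [cause elements]).
    primary_inputs is the set of elements that are set directly by the tester.
    """
    active = set()
    seen = set()
    stack = [goal]
    while stack:
        element = stack.pop()
        if element in primary_inputs or element in seen:
            continue
        seen.add(element)
        relation, causes = producers[element]
        active.add(relation)
        stack.extend(causes)
    return active


# Toy model: A and X are inputs; D is the observed output; E is unrelated to D.
producers = {
    "B": ("r1", ["A"]),
    "C": ("r2", ["B"]),
    "D": ("r3", ["C", "X"]),
    "E": ("r4", ["A"]),
}
needed = active_relations("D", producers, primary_inputs={"A", "X"})
```

In this toy model a wrong value at D implicates r1, r2, or r3, while r4 is not involved in producing D.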

One useful technique for generating candidates based on causal information has
been described by (de Kleer and Williams). The technique maintains a set of minimal
candidates, each of which must be able to account for all observed symptoms.
Probability values are assigned to each candidate, using Bayesian probability
concepts. In our approach, faults are described in terms of faulty active relations.
Candidate generation is taken directly from (de Kleer and Williams), and so is not
repeated here. A slight modification of the probability assignment approach is
presented here.

Figure 3. Active causal relations (r_i) indicating why TDRSS-IN should equal 35.
OBC memory is loaded with "read-4" in location 1. Then, in the "experiment" mode,
when time=1, the OBC sequencer sends the command, which sends the result to TDRSS.

A test passes if the value predicted by the model is observed at the test point.
After each test is run, the probabilities of all candidates are updated, based on
whether the test passes or fails. The probability that all of the relations in the
candidate are faulty is assigned to each candidate. The probability of candidate C_j,
with respect to its previous probability P(C_j), is given by Bayes' rule. If the
test fails,

$$P(C_j \mid \text{fails}) = \frac{P(\text{fails} \mid C_j)\, P(C_j)}{P(\text{fails})}$$

where

$$P(\text{fails} \mid C_j) = \begin{cases} 0 & \text{if } C_j \cap \text{active-relations} = \emptyset \\ 1 - r & \text{if } C_j \cap \text{active-relations} \neq \emptyset \end{cases}$$

$$P(\text{fails}) = \sum_{j:\, C_j \cap \text{active-relations} \neq \emptyset} P(C_j)\,(1 - r)$$

and, if the test passes,

$$P(C_j \mid \text{passes}) = \frac{P(\text{passes} \mid C_j)\, P(C_j)}{P(\text{passes})}$$

where

$$P(\text{passes} \mid C_j) = \begin{cases} 1 & \text{if } C_j \cap \text{active-relations} = \emptyset \\ r & \text{if } C_j \cap \text{active-relations} \neq \emptyset \end{cases}$$

$$P(\text{passes}) = \sum_{j:\, C_j \cap \text{active-relations} \neq \emptyset} P(C_j)\, r \;+ \sum_{j:\, C_j \cap \text{active-relations} = \emptyset} P(C_j)$$
As reflected above, even if a candidate is the fault, the correct value will in some
cases result at the test point under the conditions of the test. This effect is
approximated above with a constant, r, which indicates the probability that a
particular relation will so behave.
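
A minimal Python sketch of this update rule follows. It assumes candidates are represented as frozensets of relation names and that "the candidate involves the active relations" means a non-empty intersection; the function name `update` and the relation names are illustrative, not taken from MPC.

```python
def update(priors, active, passed, r=0.2):
    """Bayesian update of candidate probabilities after one test.

    priors: dict mapping frozenset(relation names) -> prior probability.
    active: set of relation names in the test's active relations set.
    passed: True if the predicted value was observed at the test point.
    r:      probability that a faulty relation still yields the correct value.
    """
    weights = {}
    for cand, p in priors.items():
        touches = bool(cand & active)
        if passed:
            likelihood = r if touches else 1.0
        else:
            likelihood = (1.0 - r) if touches else 0.0
        weights[cand] = likelihood * p
    total = sum(weights.values())  # this is P(passes) or P(fails)
    if total == 0.0:
        raise ValueError("observation is impossible under every candidate")
    return {cand: w / total for cand, w in weights.items()}


# Ten equally likely single-fault candidates; the test exercises r1..r6.
priors = {frozenset({f"r{i}"}): 0.1 for i in range(1, 11)}
active = {f"r{i}" for i in range(1, 7)}

after_pass = update(priors, active, passed=True)
after_fail = update(priors, active, passed=False)
```

With r = 0.2, a passing test lowers each of the six exercised candidates to 0.02/0.52 (about .038) and raises the other four to 0.1/0.52 (about .192). A failing test eliminates the four unexercised candidates and leaves the six exercised ones at 1/6 each.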

Path Construction 

Path construction is the basis of our test generation approach. The purpose of
path construction is to generate a path using causal relations from a point internal
to the system to a measurable point. By measuring the result at the output point,
evidence about the internal state will be obtained.

When generating a test, we wish to obtain knowledge about which active relations
have failed. The test path will therefore traverse causal relations from the active
relations set. For example, given the active relations of Figure 3, each test path
would pass through at least one of the relations r1 through r10. If an incorrect
value results at the end of the path, there exist faults in the relations traversed
by the path. Otherwise, evidence indicates faults probably do not exist in the path.

The process of path construction is depicted in Figure 4. The causal relation r_i
uses the values of elements e_i and causes a value at elements e_{i+1}. Assume
that the relations r' are not within the active relations set. If e_i has the
expected value, but e_{i+1} does not, r_i must be faulty. To discriminate between
r_i and the relations r_j (j > i), paths not passing through r_j are needed. Paths
through r_i and r'_{i+1} will therefore be constructed. Similarly, to discriminate
between r_i and r_j (j < i), paths through r'_{i-1} and r_i are constructed. All of
the constructed paths must terminate at measurable points, so that values can be
observed.

Figure 4. Path construction for isolating possible fault of relation r_i (see text).

To this end, the path constructor generates all paths through relations r_i of the
active relations set, with and without alternative relations r', terminating at
measurable points. [...] is activated by a test [...].
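
Enumerating paths to measurable points can be sketched as a depth-first search. This is only an illustration under simplifying assumptions: each relation is treated as a single edge from one element to another, and the names `paths_to_measurable`, `edges`, and the toy elements are hypothetical rather than drawn from the MPC model.

```python
def paths_to_measurable(start, edges, measurable):
    """Enumerate acyclic relation paths from `start` to any measurable element.

    edges maps an element to a list of (relation_name, next_element) pairs.
    Returns a list of tuples of relation names, one tuple per path.
    """
    results = []

    def walk(element, relations, visited):
        # Record the path whenever it has reached a measurable element.
        if element in measurable and relations:
            results.append(tuple(relations))
        for relation, nxt in edges.get(element, []):
            if nxt not in visited:
                walk(nxt, relations + [relation], visited | {nxt})

    walk(start, [], {start})
    return results


# Toy model: r_i and r_next lie on the original path; r_alt* are alternatives.
edges = {
    "e1": [("r_i", "e2"), ("r_alt", "m1")],
    "e2": [("r_next", "m2"), ("r_alt2", "m1")],
}
found = paths_to_measurable("e1", edges, measurable={"m1", "m2"})
```

Here the path (r_i, r_alt2) exercises r_i without r_next, and the path (r_alt,) avoids r_i altogether, which is the kind of contrast needed to discriminate between neighboring relations.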


Path Selection 


The Path Selector chooses the path most useful to test, and gives it to the Causal
Planner as a test goal.
The results of a particular test will give evidence about whether a fault exists
within the active relations of the test. Therefore, to determine the usefulness of a
path, it is necessary to know which causal relations need to be active to run the
test. Figure 5 depicts this situation. The large triangle of Figure 5 represents the
active relations which should have caused a correct value at the symptom point.
The small triangle represents the relations contributing to the expected values
needed for the relation r_{i-1}. If the path r' were to be tested, the relations
represented by the large dashed triangle of Figure 5 would be under test, because
they are all needed to activate the path r'. The way to determine this set of
relations is to run the Causal Planner, as described in "Candidate Generation."
Given a large model and a large set of potential paths (a common occurrence), the
time to evaluate all the options would be prohibitive (in the example model, the
planner runs for about 5 seconds for each plan). An approximation is therefore used.
The relations within the small triangle are already known, because they were
determined in finding the active relations leading to the original symptom. The
union of that set and the set of relations used to construct the potential path can
therefore approximate the active relations set. This approximation may skew path
selection, by affecting the probability of the test passing, as indicated in the
equations above. Fortunately, it will not affect accuracy of the test results,
because the planner will later be run for the selected path to generate the test
input. This will determine the exact active relations set.

Figure 5. The active relations set for the test path is approximated by the relations
comprising the small triangle union the relations forming the potential test path.

At this point, the potential paths, each with the set of causal relations it tests, 
have been determined. The final step in path selection is to choose the set of rela- 
tions most useful to test. Techniques similar to those presented in (de Kleer and 
Williams) apply to path selection. The essentials of the process are discussed here. 

The best test is defined as that which minimizes the expected entropy of the
candidate set, using the Information Theory definition of entropy

$$H = -\sum_{j \,\in\, \text{candidates}} P(C_j) \log P(C_j).$$

As the probabilities move toward 0 or 1, this sum is minimized. The expected
entropy resulting from a given test is

$$H = H(\text{passes})\, P(\text{passes}) + H(\text{fails})\, P(\text{fails}).$$

In terms of the definition of entropy,

$$H(\text{passes}) = -\sum_{j \,\in\, \text{candidates}} P(C_j \mid \text{passes}) \log P(C_j \mid \text{passes})$$

$$H(\text{fails}) = -\sum_{j \,\in\, \text{candidates}} P(C_j \mid \text{fails}) \log P(C_j \mid \text{fails}).$$

The conditional probabilities of each candidate are as given in "Candidate Gener- 
ation." The test which minimizes H is selected as the best test to run next. 
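
The selection step can be sketched in Python as follows. This is a sketch under stated assumptions, not the MPC implementation: candidates are frozensets of relation names, each potential path is given as a pair (already-known upstream relations, relations forming the path), and their union stands in for the approximated active relations set. The three path options and all names are hypothetical.

```python
import math


def outcome(priors, active, passed, r):
    """Return (probability of this outcome, posterior over candidates)."""
    weights = {}
    for cand, p in priors.items():
        touches = bool(cand & active)
        if passed:
            likelihood = r if touches else 1.0
        else:
            likelihood = (1.0 - r) if touches else 0.0
        weights[cand] = likelihood * p
    total = sum(weights.values())
    post = {c: w / total for c, w in weights.items()} if total > 0 else {}
    return total, post


def entropy(dist):
    """Entropy of a candidate distribution (natural log)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def expected_entropy(priors, active, r=0.2):
    """H(passes) P(passes) + H(fails) P(fails) for one potential test."""
    h = 0.0
    for passed in (True, False):
        p_outcome, post = outcome(priors, active, passed, r)
        h += p_outcome * entropy(post)
    return h


def select_path(priors, paths, r=0.2):
    """Pick the path whose approximated active set minimizes expected entropy."""
    scores = {
        name: expected_entropy(priors, frozenset(upstream | path), r)
        for name, (upstream, path) in paths.items()
    }
    return min(scores, key=scores.get), scores


priors = {frozenset({f"r{i}"}): 0.1 for i in range(1, 11)}
paths = {
    "A": ({f"r{i}" for i in range(1, 7)}, {"p1"}),   # exercises 6 candidates
    "B": ({"r1"}, {"p2"}),                           # exercises 1 candidate
    "C": ({f"r{i}" for i in range(1, 10)}, {"p3"}),  # exercises 9 candidates
}
best, scores = select_path(priors, paths)
```

Among these three options the path exercising roughly half of the candidates scores lowest, because either outcome removes a substantial share of the uncertainty.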

The final part of test generation is implemented by giving the Causal Planner the 
desired value of the measurable point (the output of the selected path) as a goal. 
The constraints on the plan are that the approximated active relations set is in- 
cluded in the solution. When the planner terminates, the plan produced is the test 
input. 


IV. MPC: An Implementation of Test Input Generation

The test-generation/fault-isolation architecture of Figure 2 has been implemented
as part of the MPC (Multi-Purpose Causal) tool. It is written in Lisp and runs on
a Symbolics 3670. MPC accepts models described
in the DEFCAUSAL syntax and currently has an interface requesting tests and ac- 
cepting measurement results. It has been tested on several models, including an 
expanded version of the example presented here. 

An example diagnosis session using MPC will now be described, indicating the
operation of the various test-generation subsystems. Assume that the sequence of
TDRSS commands (mode :configure), (write 1, (write-3 35)), and (mode :experiment)
was sent to the system of Figure 1. As shown in Figure 3, a value of 35 on the
EXP-CMD-3 control line would be expected. Assume that this value was not observed.
MPC is therefore given EXP-CMD-3 as the initial symptom point. Ten candidates are
generated, one for each of the potentially faulty relations shown in Figure 3.
The candidate probabilities are as follows:

{r1} = .100, {r2} = .100, {r3} = .100, {r4} = .100, {r5} = .100,

{r6} = .100, {r7} = .100, {r8} = .100, {r9} = .100, {r10} = .100.

Paths from the points associated with these candidates to measurable outputs 
(EXP-CMD-1 - EXP-CMD-4 and TDRSS-IN) are generated. The most useful path, 
according to the entropy-measurement equations described in "Path Selection," is 
the path from the BCU-CMD element to the measurable point EXP-CMD-1. This 
selection corresponds to the intuitive "divide the problem in half" approach often
used by technicians. To generate a test of this path, the causal planner is invoked,
and is constrained to use the causal relations rl through r6 in the plan, as they 
were used to cause the point of interest BCU-CMD, as can be seen in Figure 3. 
The resulting plan is 


TDRSS-OUT = (mode :configure)
TDRSS-OUT = (write 1, (write-1 35))
TDRSS-OUT = (mode :experiment).

MPC then prompts 

Is the value at EXP-CMD-1 equal to 35? 


Assume that the answer is "yes." The updated candidate probabilities are 

{r1} = .039, {r2} = .039, {r3} = .039, {r4} = .039, {r5} = .039,

{r6} = .039, {r7} = .192, {r8} = .192, {r9} = .192, {r10} = .192,

indicating that the candidates describing the relations on the path from the experi- 
ment back to the TDRSS are most suspect. The paths from the active relations are 
once again evaluated. Based on the new candidate probabilities, however, the most 
useful path is from BCU-CMD to TDRSS-IN, but using a different causal relation 
from BCU-CMD. The path selected goes through the relation r11, using EXP-DATA-1,
rather than the original EXP-DATA-4. The resulting plan is


TDRSS-OUT = (mode :configure)
TDRSS-OUT = (write 1, read-1)
EXP-DATA-1 = 35
TDRSS-OUT = (mode :experiment),

followed by the prompt 


Is the value of TDRSS-IN equal to 35? 


If the answer to the test is "yes," the probabilities indicate a strong preference for 
a single candidate, indicating that r7 is the faulty mechanism: 

{r1} = .022, {r2} = .022, {r3} = .022, {r4} = .022, {r5} = .022, {r6} = .022,

{r7} = .543, {r8} = .109, {r9} = .109, {r10} = .109.

If this amount of convergence is sufficient to terminate testing, MPC reports its 
findings. Because r7 is implemented in the RIU module, RIU is reported as the
suspect module. These results were obtained by using a value of .2 for r in the
probability equations. With a smaller value, the convergence on the candidate {r7}
would have been faster.
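
The effect of r on convergence can be checked with a short calculation. It assumes, consistent with the probabilities reported above, that the first test's active relations set covers r1 through r6 and the second covers r1 through r6 and r8 through r10, so that {r7} is exercised by neither test. Starting from a prior of 0.1 per candidate and applying the pass-case update twice gives

$$P(\{r_7\} \mid \text{both tests pass}) = \frac{0.1}{0.1 + 3\,(0.1)\,r + 6\,(0.1)\,r^2} = \frac{1}{1 + 3r + 6r^2}.$$

For r = 0.2 this is 1/1.84, or about .543, matching the value above. For r = 0.1 the same two tests would give 1/1.36, or about .735.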
Conclusions


The test generation architecture implemented in the MPC system contributes a 
new tool to the set of causal reasoning capabilities now available. For systems in 
which few points are accessible, or in which transient effects are important, it pro- 
vides a means to dynamically generate tests in response to observed symptoms. 

The MPC approach is an extension to several other causal-reasoning efforts. Shar- 
ing some of the techniques of (Shirley), it generates tests to narrow down fault 
hypotheses, rather than to test specific components. It makes use of the probabilis- 
tic hypothesis generation and belief ideas of (de Kleer and Williams), but for test- 
generation purposes. This use of probability avoids the need for the "evidence 
model" required in the approach described in (Schaefer). 

The current approach assumes that when a causal relation fails, the physical fail- 
ure is in the device designed to implement the relation. Occasionally, however, an- 
other device may have a failure, such as a short circuit, which interferes to cause 
the relation to fail. Using the model to explore these possibilities, making use of 
"Pathways of Interactions" techniques similar to (Davis), is a topic of ongoing re- 
search. Other extensions include more sophisticated techniques for explaining the 
significance of test results to the user. 